collocations              package:none               R Documentation

Extract collocations for a target word from a given raw text.

Description:

  collocates receives a text and a target word and selects the sentences 
from the text that contain the target word. From those sentences, the 
co-occurrences between the target word and the other words that are above a 
certain threshold constitute the set of collocations.
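
  The steps described above can be sketched roughly as follows. This is 
only an illustration, not the actual code of collocates: the helper name 
sketch_collocates, the frequency-based threshold argument, the lowercasing 
and the simple punctuation-based sentence split are all assumptions, and 
the text is passed as a character string rather than as a file name.

```r
# Hypothetical helper, NOT the implementation of collocates().
# Assumptions: sentences end with ".", "!" or "?"; matching is
# case-insensitive; the threshold is a minimum co-occurrence count;
# targetword contains no regular expression metacharacters.
sketch_collocates <- function(text, targetword, threshold = 2) {
  target <- tolower(targetword)

  # Split the text into sentences on sentence-final punctuation.
  sentences <- tolower(unlist(strsplit(text, "[.!?]+")))

  # Keep only the sentences that contain the target as a whole word.
  pattern <- paste0("\\b", target, "\\b")
  hits <- sentences[grepl(pattern, sentences, perl = TRUE)]

  # Split the selected sentences into words and drop the target itself.
  words <- unlist(strsplit(hits, "[^[:alpha:]]+"))
  words <- words[nzchar(words) & words != target]

  # Count each remaining word and keep those at or above the threshold.
  counts <- sort(table(words), decreasing = TRUE)
  counts[counts >= threshold]
}

example_text <- paste("I like science fiction. Science fiction is fun.",
                      "Cats sleep. Fiction about science sells.")
sketch_collocates(example_text, "fiction", threshold = 2)
```

  In this sketch, a word counts once for every time it appears in a 
sentence that contains the target word; other definitions of co-occurrence 
(for example, a fixed window around the target) are equally possible.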

Usage:

  collocates(thetext, targetword, ncollmax)

Arguments:

  thetext		character. Name of a text file supplied by the user, in 
.txt format and UTF-8 encoding. 

  targetword		character. Any word the user has chosen 
from the text. It will be the reference for the extraction of the collocations.  

  ncollmax		numeric. Maximum number of collocates to be 
displayed on the graph generated by the function. If fewer collocates are 
extracted than the stipulated maximum, ncollmax is ignored.
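
  As an illustration of how a cap such as ncollmax behaves, the sketch 
below keeps at most ncollmax of the most frequent entries. The helper name 
top_collocates and the named-vector input are assumptions made for this 
example; collocates itself may apply the limit differently.

```r
# Hypothetical helper, NOT part of collocates().
# counts: named numeric vector of co-occurrence counts (assumed format).
top_collocates <- function(counts, ncollmax) {
  sorted <- sort(counts, decreasing = TRUE)
  # head() returns everything when there are fewer than ncollmax
  # entries, so the cap is effectively ignored in that case.
  head(sorted, ncollmax)
}

counts <- c(the = 5, of = 3, book = 2)
top_collocates(counts, 2)   # keeps the two most frequent entries
top_collocates(counts, 10)  # fewer than 10 available: all three are kept
```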

Details:

  The function may not work well on large text files, even though some 
optimizations were attempted, such as using hashed environments to count 
word occurrences faster. 
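
  The idea of using a hashed environment as a counter can be sketched as 
below. This is a minimal example under assumptions, not the code used 
inside collocates: the helper name count_words is hypothetical, and the 
input is assumed to be a character vector of non-empty words.

```r
# Hypothetical helper, NOT the implementation used by collocates().
# An environment created with hash = TRUE gives fast lookup by name,
# so each word is used as a key and its count as the value.
count_words <- function(words) {
  counts <- new.env(hash = TRUE)
  for (w in words) {
    # Words must be non-empty strings: "" is not a valid variable name.
    if (exists(w, envir = counts, inherits = FALSE)) {
      counts[[w]] <- counts[[w]] + 1L
    } else {
      counts[[w]] <- 1L
    }
  }
  # Convert the environment into a named integer vector.
  keys <- ls(counts, all.names = TRUE)
  unlist(mget(keys, envir = counts))
}

count_words(c("film", "book", "film", "story", "film"))
```

  For moderate inputs, table() gives the same counts with less code; the 
environment approach is shown only because it is the optimization this 
section mentions.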

Value:

  Instead of returning values, collocates generates a text file and a 
barplot file in PNG format. Both are saved in the working directory in 
use when the function is run.
	
Warning:

  Depending on the size of the text file, the function may become too slow 
or fail to run. As a suggestion, the user can experiment with the function 
on texts of different sizes. See Examples for a simple test of the function. 

Author:

  Viviane Santos da Silva
  
  viviane.sds90@gmail.com
  viviane.santos.silva@usp.br

References:

  http://en.wikibooks.org/wiki/R_Programming/Text_Processing Last 
accessed on May 18, 2014.

  About environments and the hash argument: 
http://adv-r.had.co.nz/Environments.html (A hash function has been created 
to optimize the use of hashes, but it only works in later versions of R. 
See "See Also".)

  Download of non-annotated corpora for testing the function: 
http://corpora.informatik.uni-leipzig.de/download.html Last accessed 
on May 15, 2014.
  
  For a more intuitive introduction to collocations: 
http://esl.fis.edu/grammar/easy/colloc.htm

See Also:

  For more information on hash usage in R, see: 
http://cran.r-project.org/web/packages/hash/index.html, 
http://cran.r-project.org/web/packages/hash/hash.pdf and http://opendatagroup.wordpress.com/2009/07/26/hash-package-for-r/.

Examples:

  # Download the file "teste-texto-bbc.txt" from
  # http://ecologia.ib.usp.br/bie5782/doku.php?id=bie5782:01_curso_atual:alunos:trabalho_final:viviane.santos.silva:start
  # and save it to your R working directory as "test-text-bbc.txt" to run this example.
  
  collocates(thetext="test-text-bbc.txt", targetword="fiction", ncollmax=10) 
  # Generates a barplot of the first 10 collocates that co-occur 
  # with the target word "fiction" in the given text.

